C++: Support definition by reference in data flow library#1000
Merged
Conversation
This enables data flow through `memcpy` and similar functions modeled in `semmle.code.cpp.model`.
Contributor
|
Looks promising. |
This accessor may not be forward-compatible with an IR-based version, and it's unclear whether it has any use. The `VariableAccess` remains in the `TDefinitionByReferenceNode` constructor since it's used to implement `getType`.
This commits also adds a test that uses `getParameter`. The new tests demonstrate that support for array-to-pointer decay works, but we get data flow to the array rather than its contents.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The IR-based data flow library won’t become the default in 1.20, so I've added support for definition by reference to the current one. This PR doesn't support flow out of callers whose body assigns to a pointer parameter; that will still have to wait for the IR. What's supported now is flow through known library functions and manual specification of flow sources as
DefinitionByReferenceNodes. I've hooked data flow up to thesemmle.code.cpp.models.interfaces.DataFlowlibrary, which immediately gives us flow throughmemcpyand related functions.We need to find out whether this interface with
DefinitionByReferenceNodeis going to be forward-compatible with whatever is coming in the IR. We can always deprecate it, but it should at least allow feature parity between the current library and the IR-based one.Before this PR, data flow around definition by reference was different depending on whether a variable was modelled in
FlowVar.qllasSsaVarorBlockVar. It's supposed to be an implementation detail that variables can be modelled in these two different ways. Essentially,SsaVar-modelled variables would stop their data flow propagation atf(&x), whileBlockVar-modelled variables wouldn't. I've changedBlockVarto do whatSsaVardoes.The implementation was a bit tricky because it added cases to
FlowVarwhere a control-flow node is simultaneously a definition and a use, similar to what I did in https://git.semmle.com/Semmle/code/pull/23501.